Creating Lexical Resources for Endangered Languages

Authors

  • Khang Nhut Lam
  • Feras Al Tarouti
  • Jugal Kalita
Abstract

This paper examines approaches to generating lexical resources for endangered languages. Our algorithms construct bilingual dictionaries and multilingual thesauruses using public Wordnets and a machine translator (MT). Since our work relies on only one bilingual dictionary between an endangered language and an “intermediate helper” language, it is applicable to languages that lack many existing resources.
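
As an illustration only, the idea of reaching new languages through an “intermediate helper” language can be sketched as composing two word-level mappings. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `pivot_dictionary`, the toy word lists, and the use of a plain lookup table in place of a machine translator or Wordnet are all hypothetical.

```python
from collections import defaultdict


def pivot_dictionary(source_to_helper, helper_to_target):
    """Compose two word-level mappings through a helper language.

    source_to_helper: maps each source-language word to its helper-language
        translations (stands in for the single existing bilingual dictionary).
    helper_to_target: maps each helper-language word to target-language
        translations (stands in for a machine translator; hypothetical data).
    """
    candidates = defaultdict(set)
    for source_word, helper_words in source_to_helper.items():
        for helper_word in helper_words:
            # Helper words without a known translation contribute nothing.
            for target_word in helper_to_target.get(helper_word, ()):
                candidates[source_word].add(target_word)
    # Sort for deterministic output.
    return {word: sorted(targets) for word, targets in candidates.items()}


# Toy, made-up entries: "src_*" are placeholder source-language words.
source_to_helper = {
    "src_water": ["water"],
    "src_house": ["house", "home"],
}
helper_to_target = {
    "water": ["eau"],
    "house": ["maison"],
    "home": ["maison", "foyer"],
}

new_dictionary = pivot_dictionary(source_to_helper, helper_to_target)
```

A plain composition like this tends to over-generate candidates when helper words are ambiguous, which is presumably where sense inventories such as Wordnets help narrow the choices.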

Similar resources

Indigenous Languages of Indonesia: Creating Language Resources for Language Preservation

In this paper, we report a survey of language resources in Indonesia, primarily of indigenous languages. We look at the official Indonesian language (Bahasa Indonesia) and 726 regional languages of Indonesia (Bahasa Nusantara) and list all the available lexical resources (LRs) that we could gather. This paper suggests that the smaller regional languages may remain relatively unstudied, and unkn...

Automatically Creating Multilingual Lexical Resources

The thesis proposes creating bilingual dictionaries and Wordnets for languages without many lexical resources using resources of resource-rich languages. Our work will have the advantage of creating lexical resources, reducing time and cost and at the same time improving the quality of resources created.

Time to change the “D” in “DEL”

The “D” in “DEL” stands for “documenting” – a code word for linguists that means the collection of linguistic data in audio and written form. The DEL (Documenting Endangered Languages) program run by the NSF and NEH is thus centered around building and archiving data resources for endangered languages. This paper is an argument for extending the ‘D’ to include “describing” languages in terms of...

LERIL: Collaborative Effort for Creating Lexical Resources

The paper reports on efforts taken to create lexical resources pertaining to Indian languages, using the collaborative model. The lexical resources being developed are: (1) transfer lexicon and grammar from English to several Indian languages, and (2) dependency tree bank of annotated corpora for several Indian languages. The dependency trees are based on the Paninian model. (3) is an attempt t...

Towards a Common Conceptual Framework of Language Documentation

Language represents shared conventionalization of concepts by all speakers. Hence language documentation preserves information far beyond a collection of sound shapes, lexical forms, and grammatical structures. The preservation of linguistically conventionalized conceptual structure is even more crucial for endangered language since this information is very often not available elsewhere. Howeve...

Publication date: 2014